skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Walji, Muhammad F"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract BackgroundWhile most health-care providers now use electronic health records (EHRs) to document clinical care, many still treat them as digital versions of paper records. As a result, documentation often remains unstructured, with free-text entries in progress notes. This limits the potential for secondary use and analysis, as machine-learning and data analysis algorithms are more effective with structured data. ObjectiveThis study aims to use advanced artificial intelligence (AI) and natural language processing (NLP) techniques to improve diagnostic information extraction from clinical notes in a periodontal use case. By automating this process, the study seeks to reduce missing data in dental records and minimize the need for extensive manual annotation, a long-standing barrier to widespread NLP deployment in dental data extraction. Materials and MethodsThis research utilizes large language models (LLMs), specifically Generative Pretrained Transformer 4, to generate synthetic medical notes for fine-tuning a RoBERTa model. This model was trained to better interpret and process dental language, with particular attention to periodontal diagnoses. Model performance was evaluated by manually reviewing 360 clinical notes randomly selected from each of the participating site’s dataset. ResultsThe results demonstrated high accuracy of periodontal diagnosis data extraction, with the sites 1 and 2 achieving a weighted average score of 0.97-0.98. This performance held for all dimensions of periodontal diagnosis in terms of stage, grade, and extent. DiscussionSynthetic data effectively reduced manual annotation needs while preserving model quality. Generalizability across institutions suggests viability for broader adoption, though future work is needed to improve contextual understanding. ConclusionThe study highlights the potential transformative impact of AI and NLP on health-care research. Most clinical documentation (40%-80%) is free text. Scaling our method could enhance clinical data reuse. 
    more » « less
    Free, publicly-accessible full text available May 2, 2026